Querying `Series`
A pandas Series can be queried either by index position or by index label. If no index is specified when the Series is created, pandas assigns a default integer index, so the position and the label are effectively the same values.
Querying by Index Position
To query by numeric location (starting at zero), use the iloc attribute.
import pandas as pd
students_classes = {'Alice': 'Physics',
                    'Jack': 'Chemistry',
                    'Molly': 'English',
                    'Sam': 'History'}
s = pd.Series(students_classes)
# Querying the fourth entry using iloc
s.iloc[3]
Querying by Index Label
To query by the index label, use the loc attribute.
# Querying the class Molly is taking using loc
s.loc['Molly']
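The difference between the two attributes is easiest to see with slices. The following is a minimal sketch that rebuilds the Series above; the variable names by_position, by_label, and last_entry are illustrative and not part of the original notes.

```python
import pandas as pd

s = pd.Series({'Alice': 'Physics', 'Jack': 'Chemistry',
               'Molly': 'English', 'Sam': 'History'})

# Position-based slice: the stop position is excluded, as in plain Python.
by_position = s.iloc[1:3]

# Label-based slice: both endpoint labels are included.
by_label = s.loc['Jack':'Sam']

# Negative positions count from the end of the Series.
last_entry = s.iloc[-1]

print(list(by_position.index))  # ['Jack', 'Molly']
print(list(by_label.index))     # ['Jack', 'Molly', 'Sam']
print(last_entry)               # History
```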
Using the Indexing Operator
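pandas also accepts the indexing operator directly on the Series and decides from the key whether a position or a label is meant. The positional fallback for integer keys is deprecated in pandas 2.x, so prefer iloc when querying by position.
# Querying by integer key (treated like iloc on a non-integer index in older pandas)
s[3]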
# Querying by object parameter (acts like loc)
s['Molly']
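Label lookups through the indexing operator can be checked against loc, as in this small sketch. The label 'Zoe' and the default string 'not enrolled' are made up for illustration.

```python
import pandas as pd

s = pd.Series({'Alice': 'Physics', 'Jack': 'Chemistry',
               'Molly': 'English', 'Sam': 'History'})

# The indexing operator with a label gives the same result as loc.
same_result = s['Molly'] == s.loc['Molly']

# `in` tests membership in the index labels, not in the values.
has_molly = 'Molly' in s
has_english = 'English' in s

# get returns a default instead of raising KeyError for a missing label.
missing = s.get('Zoe', 'not enrolled')

print(same_result, has_molly, has_english, missing)
```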
Indexing with Integer Labels
If the index labels are integers, the indexing operator treats an integer key as a label, not as a position. Be explicit by using the iloc or loc attributes to avoid confusion.
class_code = {99: 'Physics',
              100: 'Chemistry',
              101: 'English',
              102: 'History'}
s = pd.Series(class_code)
# This will result in a KeyError
s[0]
# Correct way to query the first item
s.iloc[0]
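The failure mode can be reproduced safely by catching the exception. This is a sketch under the same data as above; lookup_failed, first_by_position, and first_by_label are illustrative names.

```python
import pandas as pd

s = pd.Series({99: 'Physics', 100: 'Chemistry',
               101: 'English', 102: 'History'})

# With an integer index, s[0] looks for the label 0, which does not exist.
try:
    s[0]
    lookup_failed = False
except KeyError:
    lookup_failed = True

# Being explicit removes the ambiguity.
first_by_position = s.iloc[0]  # first entry, whatever its label
first_by_label = s.loc[99]     # entry whose label is 99

print(lookup_failed, first_by_position, first_by_label)
```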
Working with Series Data
Iterating Over Series
A common task is to perform an operation on every value in a Series, for example computing the average by looping over the values.
grades = pd.Series([90, 80, 70, 60])
total = 0
for grade in grades:
    total += grade
print(total / len(grades))
Vectorization
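Vectorization applies an operation to the whole Series at once instead of looping over it in Python. pandas and NumPy run such operations in optimized compiled code, which is usually much faster.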
import numpy as np
# Using NumPy's sum function
total = np.sum(grades)
print(total / len(grades))
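As a quick check, the loop and the vectorized version should agree, and the Series also offers its own mean method. A minimal sketch with the same grades; the variable names are illustrative.

```python
import numpy as np
import pandas as pd

grades = pd.Series([90, 80, 70, 60])

# Explicit Python loop.
loop_total = 0
for grade in grades:
    loop_total += grade
loop_mean = loop_total / len(grades)

# Vectorized with NumPy.
numpy_mean = np.sum(grades) / len(grades)

# Vectorized with the Series' own method.
builtin_mean = grades.mean()

print(loop_mean, numpy_mean, builtin_mean)  # 75.0 75.0 75.0
```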
Performance Comparison
Jupyter Notebook's %%timeit cell magic can be used to compare the performance of the two approaches. The magic must be the first line of its cell.
numbers = pd.Series(np.random.randint(0, 1000, 10000))
%%timeit -n 100
# Iterative approach
total = 0
for number in numbers:
    total += number
total / len(numbers)
%%timeit -n 100
# Vectorized approach
total = np.sum(numbers)
total / len(numbers)
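Outside a notebook, where the %%timeit magic is unavailable, a similar comparison can be sketched with the standard library's timeit module. The helper names mean_with_loop and mean_vectorized are illustrative, and the timings depend on the machine, so no expected numbers are shown.

```python
import timeit

import numpy as np
import pandas as pd

numbers = pd.Series(np.random.randint(0, 1000, 10000))

def mean_with_loop(values):
    """Average computed with an explicit Python loop."""
    total = 0
    for value in values:
        total += value
    return total / len(values)

def mean_vectorized(values):
    """Average computed with NumPy's vectorized sum."""
    return np.sum(values) / len(values)

# Run each approach 10 times and report the total elapsed seconds.
loop_seconds = timeit.timeit(lambda: mean_with_loop(numbers), number=10)
vector_seconds = timeit.timeit(lambda: mean_vectorized(numbers), number=10)

print(f"loop:       {loop_seconds:.4f} s")
print(f"vectorized: {vector_seconds:.4f} s")
```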
Broadcasting
Broadcasting applies an operation to every value in the Series at once, without writing a loop.
numbers.head()
numbers += 2
numbers.head()
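Broadcasting is not limited to addition. The sketch below uses a small, fixed Series so the results are easy to verify by hand; the names shifted, doubled, and above_fifteen are illustrative.

```python
import pandas as pd

numbers = pd.Series([10, 20, 30])

# Each expression returns a new Series and leaves `numbers` untouched.
shifted = numbers + 2
doubled = numbers * 2
above_fifteen = numbers > 15  # comparisons broadcast too, giving booleans

# The augmented assignment updates the values behind the name `numbers`.
numbers += 2

print(shifted.tolist())        # [12, 22, 32]
print(doubled.tolist())        # [20, 40, 60]
print(above_fifteen.tolist())  # [False, True, True]
print(numbers.tolist())        # [12, 22, 32]
```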
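Iterating with items
The same change can be made by iterating through the Series with items and updating each value in turn. (This method was called iteritems in older pandas; iteritems was removed in pandas 2.0.)
# iat is position-based; it works here because the default index makes label and position equal
for label, value in numbers.items():
    numbers.iat[label] = value + 2
numbers.head()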
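When the index is not the default integer range, the label and the position no longer coincide, so a position-based accessor cannot take the label. A minimal sketch of the same loop on a Series with string labels, assuming a pandas version that provides Series.items; it uses at, which looks up by label. The labels 'a', 'b', 'c' are made up for illustration.

```python
import pandas as pd

s = pd.Series([1, 2, 3], index=['a', 'b', 'c'])

# items yields (label, value) pairs; at sets a single value by label.
for label, value in s.items():
    s.at[label] = value + 2

print(s.tolist())  # [3, 4, 5]
```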
Performance Comparison with Broadcasting
%%timeit -n 10
# Iterative approach (items replaces iteritems, which was removed in pandas 2.0)
s = pd.Series(np.random.randint(0, 1000, 1000))
for label, value in s.items():
    s.loc[label] = value + 2
%%timeit -n 10
# Broadcasting approach
s = pd.Series(np.random.randint(0, 1000, 1000))
s += 2
Modifying Series with loc
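The loc attribute can modify data in place or add new data. If the label passed to loc does not exist in the index, a new entry with that label is added, even when its type differs from the existing labels.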
s = pd.Series([1, 2, 3])
s.loc['History'] = 102
s
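Both behaviours of loc assignment can be seen side by side in this short sketch. The value 10 is an arbitrary choice for illustration.

```python
import pandas as pd

s = pd.Series([1, 2, 3])

# The label 0 already exists, so its value is overwritten in place.
s.loc[0] = 10

# The label 'History' does not exist, so a new entry is added.
s.loc['History'] = 102

print(list(s.index))  # [0, 1, 2, 'History']
print(len(s))         # 4
print(s.index.dtype)  # object, because integer and string labels are mixed
```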
Merging Series
Index values do not have to be unique. The following code creates a Series in which every entry has the same label and combines it with another Series.
students_classes = pd.Series({'Alice': 'Physics',
                              'Jack': 'Chemistry',
                              'Molly': 'English',
                              'Sam': 'History'})
kelly_classes = pd.Series(['Philosophy', 'Arts', 'Math'], index=['Kelly', 'Kelly', 'Kelly'])
# Merging the Series (Series.append was removed in pandas 2.0; pd.concat is its replacement)
all_students_classes = pd.concat([students_classes, kelly_classes])
all_students_classes
Considerations with Append
- Appending returns a new Series; the original Series are not modified.
- pandas tries to infer the best data type for the result.
students_classes
all_students_classes.loc['Kelly']
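These points can be checked directly. The sketch below assumes pd.concat is used to combine the Series; the final two-Series numeric example is an added illustration of type inference and is not part of the original notes.

```python
import pandas as pd

students_classes = pd.Series({'Alice': 'Physics', 'Jack': 'Chemistry',
                              'Molly': 'English', 'Sam': 'History'})
kelly_classes = pd.Series(['Philosophy', 'Arts', 'Math'],
                          index=['Kelly', 'Kelly', 'Kelly'])

all_students_classes = pd.concat([students_classes, kelly_classes])

# The original Series is unchanged.
original_length = len(students_classes)

# The combined index contains the label 'Kelly' three times.
labels_unique = all_students_classes.index.is_unique

# A repeated label returns a Series; a unique label returns a single value.
kelly = all_students_classes.loc['Kelly']
alice = all_students_classes.loc['Alice']

# Type inference: integers combined with a float are stored as float64.
combined_numbers = pd.concat([pd.Series([1, 2]), pd.Series([0.5])])

print(original_length)         # 4
print(labels_unique)           # False
print(list(kelly))             # ['Philosophy', 'Arts', 'Math']
print(alice)                   # Physics
print(combined_numbers.dtype)  # float64
```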